Reducing Bias of Allele Frequency Estimates by Modeling SNP Genotype Data with Informative Missingness

Authors

  • Wan-Yu Lin
  • Nianjun Liu
Abstract

The presence of missing single-nucleotide polymorphism (SNP) genotypes is common in genetic studies. For studies with low-density SNPs, the most commonly used approach to dealing with genotype missingness is to simply remove the observations with missing genotypes from the analyses. This naïve method is straightforward but is valid only when the missingness is random. However, a given assay often differs in its ability to genotype heterozygotes and homozygotes, causing the phenomenon of "differential dropout", in which the missing rates of heterozygotes and homozygotes are different. In practice, differential dropout among genotypes exists even in carefully designed studies, such as the HapMap project and the Wellcome Trust Case Control Consortium. Under the assumption of Hardy-Weinberg equilibrium and no genotyping error, we here propose a statistical method to model the differential dropout among different genotypes. Compared with the naïve method, our method provides more accurate allele frequency estimates when differential dropout is present. To demonstrate its practical use, we further apply our method to the HapMap data and a scleroderma data set.
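As a rough illustration of why the naïve method can be biased, the following sketch computes the allele frequency that a complete-case estimate would converge to under Hardy-Weinberg equilibrium when heterozygotes and homozygotes go missing at different rates. It is not the estimator proposed in the paper. The function names, the two-rate dropout model (one rate shared by both homozygotes, one for heterozygotes), and the numeric values are assumptions made for this example only.

```python
def hwe_genotype_freqs(p):
    """Genotype frequencies (AA, Aa, aa) under Hardy-Weinberg equilibrium
    for an allele A with population frequency p."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def expected_naive_estimate(p, miss_hom, miss_het):
    """Large-sample value of the complete-case (naive) estimate of p.

    Assumption for illustration: both homozygotes share one missing rate
    (miss_hom) and heterozygotes have their own (miss_het).
    """
    f_AA, f_Aa, f_aa = hwe_genotype_freqs(p)

    # Probability that a genotype is both present and successfully called.
    obs_AA = f_AA * (1.0 - miss_hom)
    obs_Aa = f_Aa * (1.0 - miss_het)
    obs_aa = f_aa * (1.0 - miss_hom)

    observed_total = obs_AA + obs_Aa + obs_aa
    # Each observed AA carries two A alleles, each observed Aa carries one.
    return (obs_AA + 0.5 * obs_Aa) / observed_total


if __name__ == "__main__":
    true_p = 0.3  # hypothetical true allele frequency
    for miss_hom, miss_het in [(0.05, 0.05), (0.02, 0.10), (0.10, 0.02)]:
        est = expected_naive_estimate(true_p, miss_hom, miss_het)
        print(f"miss_hom={miss_hom:.2f} miss_het={miss_het:.2f} "
              f"naive estimate={est:.4f} bias={est - true_p:+.4f}")
```

When the two missing rates are equal, the dropout factors cancel and the naïve estimate equals the true frequency. When they differ, the observed genotype proportions no longer match the Hardy-Weinberg proportions, so the estimate shifts away from the true value unless the allele frequency is exactly 0.5.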

Similar resources

Evaluation of ten SNP Markers for Human Identification and Paternity Analysis in Persian Population

Background: DNA markers are indispensable tools for human identification in forensic science. Single nucleotide polymorphisms (SNPs) are one category of these markers and are of particular interest for degraded DNA because of their short amplicons. Objectives: Detection of highly informative SNPs by the criteria is the essential step to devel...

Polymorphism in the interleukin-10 promoter affects both provirus load and the risk of human T lymphotropic virus type I (HTLV-I) associated myelopathy/tropical spastic paraparesis

To investigate candidate genes that influence the risk of HTLV-I associated myelopathy/tropical spastic paraparesis (HAM/TSP), we analyzed 6 single nucleotide polymorphisms (SNP) in the interleukin-10 (IL-10) promoter region. METHODS: 280 cases of HAM/TSP patients and 255 HTLV-I seropositive asymptomatic carriers (HCs) from Kagoshima, Japan were studied. All subjects gave written informed conse...

Mapping Bias Overestimates Reference Allele Frequencies at the HLA Genes in the 1000 Genomes Project Phase I Data

Next-generation sequencing (NGS) technologies have become the standard for data generation in studies of population genomics, such as the 1000 Genomes Project (1000G). However, these techniques are known to be problematic when applied to highly polymorphic genomic regions, such as the human leukocyte antigen (HLA) genes. Because accurate genotype calls and allele frequency estimations are crucial to...

Allelic and Genotypic Distribution in Single Nucleotide Polymorphism (SNP) G.676A > G of Melanocortin-1 Receptor (MC1R) Gene in Indonesian Goat Breeds

The melanocortin-1 receptor (MC1R) gene has been investigated by many studies regarding the pigmentation variation in various species. In order to determine its allelic and genotypic distribution, we sequenced the goat MC1R gene from 78 individuals in ten populations (Gembrong, Senduro, Ettawa Grade, Boerawa, Boerka, Kosta, Samosir, Muara, Boer and Kacang). Direct sequencing m...

Journal title:

Volume 3, Issue

Pages -

Publication date: 2012